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I .  INTRODUCTION 


An  axiom  of  many  simulation  studies  is  that  an  outcome  Y,  or  distribution 
of  outcomes  G(Y|x),  of  interest  can  be  computer  generated  using  as  input 

experimentally  derived  data  . 


A  commonly  encountered  procedure  is  one  in  which  a  set  of  experimental 
data  is  considered  to  be  a  random  sample  from  some  underlying  but  unknown 
distribution;  this  data  is  then  modeled  by  a  common  statistical  distribution 
to  provide  a  convenient  representation  (with  a  coincident  loss  of  information), 
and  is  then  used  as  the  basis  for  generating  additional  pseudo-observations 
(Monte  Carlo  values) .  The  intent  inherent  in  this  procedure  is  that  the 
pseudo-observations  maintain  the  statistical  structure  of  the  original  data 
set . 


The  intermediate  step  of  modeling  or  "fitting"  the  data  as  a  statistical 
distribution  is  sometimes  unnecessary  and  sometimes  nearly  impossible.  For 
example,  for  multimodal  data  or  multivariate  data,  it  is  usually  difficult 
and  often  unrealistic  to  attempt  to  characterize  the  data  analytically. 

Because  of  this  fact,  there  exists  little,  if  any,  guidance  for  the  practi¬ 
tioner  who  is  confronted  with  data  of  this  type.  Notice,  however,  that  to 
serve  as  input  for  simulation,  all  that  may  actually  be  required  is  to  provide 
pseudo-observations  that  exhibit  the  same  statistical  characteristics  as  the 
original  data  set,  with  no  real  necessity  to  formally  characterize  the 
underlying  distribution. 

It  is  in  response  to  this  observation  that  the  following  research  was 
initiated  and  algorithm  developed. 


II.  THE  ALGORITHM 


Let  us  consider  the  following  situation  addressed  by  Thompson  and 
Taylor We  have  a  random  sample  (X.}_.^  of  size  n  from  a  multivariate  dis¬ 
tribution  of  dimension  k,  and  we  want  to  generate  pseudorandom  vectors  from 
the  underlying,  but  unknown,  distribution  that  gave  rise  to  the  random 
sample.  Since  we  do  not  know,  and  usually  will  never  know,  the  form  of 
this  distribution,  our  attack  should  be  empirical.  We  shall  endeavor  to 
see  to  it  that  our  pseudorandom  vectors  look  very  much  like  those  in  the 
original  data  set.  In  so  doing,  we  will  maintain  the  essential  structural 
integrity  of  the  problem. 


We  now  direct  our  attention  to  the  mechanics  of  the  algorithm.  After 
carrying  out  a  rough  rescaling  to  account  for  differing  variances  that  may 
exist  among  the  k  variates,  we  select  at  random  one  of  the  n  data  points, 
say  Xj,  from  the  data  base  and  then  proceed  to  determine  its  m-1  nearest 

neighbors.  The  nearest  neighbors  are  determined  under  the  ordinary  Euclidean 


J.R.  Thompson  and  M.S.  Taylor,  "A  Data  Based  Random  Number  Generator  for  a 
Multivariate  Distribution,"  Proceedings  of  the  Twenty-Seventh  Conference  on 
the  Design  of  Experiments  in  Army  Research,  Development,  and  Testing  (1981). 
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metric.  The  value  of  m,  which  can  best  be  determined  after  perusal  of  the 
data,  will  depend  upon  the  sample  size  n  and  the  characteristics  of  the  data. 
A  conservative  estimate  would  be  to  choose  m  =  n/20. 


The  vectors  {X.}.,  are  now  coded  about  the  sample  mean  X  =  1/m  E  X. 

1  J=1  i 

to  yield  {X^}  =  {X_.  -  xK^j,  and  an  independent  random  sample  of  size  m  is 


generated  from  the  uniform  distribution  U (1/m  - 

Now  the  linear  combination 


V 


3 fm-1) 
2 


ra 


,  1/m 


3  (m- 1) 


m 


m 


X'  =  X  uoX 


Z=1 


l  l 


is  formed,  where  {u  }  m  is  the  random  sample  from  the  U(l/ra  -  A*,  1/m  +  A"  ) 

Z  SL — 1 

Finally  the  translation 


X  =  X'  +  X 

restores  the  relative  magnitude,  and  X  is  a  pseudorandom  vector  which  we 
propose  to  be  representative  of  the  multivariate  distribution  that  provided 

the  (X.)  A  . 

1  1=1 

To  obtain  the  next  pseudorandom  vector  we  randomly  select  another  of 
the  n  data  points  and  proceed  as  above. 

We  will  now  attempt  to  advance  the  algorithm  by  considering  the  mathe¬ 
matics  that  suggests  the  mechanics  that  we  have  just  outlined.  Consider 
the  distribution  of  X1  and  its  m-1  nearest  neighbors: 

{ (x  x  x,  ) ' }A  =  {X„}„m, .  Let  us  suppose  that  this  "truncated  set" 

U’  2Z>  kJr  ;2.=1  v  H!i=l  ^ 

of  random  observations  has  mean  vector  y  and  covariance  matrix  a.  Let 

{»}  "  be  an  independent  random  sample  from  the  uniform  distribution 

X/  X/  — 

U(l/m  -  A*,  1/m  +  A"  ) .  Then,  E(u^)  =  1/m,  Var(u^)  =  (m-l)/m2,  and 
Cov(u^,  Uj)  =0,  for  i  /  j. 

Forming  the  linear  combination 


Z 


m 

1 

£=1 


u„x„ 
Z  l 
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we  have,  for  the  rth  component  zr  =  +  u2xr2  + 

relations 


.  +  u  x  ,  the  following 
m  rm 


E(z  )  =  m  •  1/m  •  ur  =  Vr, 


2  2 
Var(zr)  =  ar  +  (m-l)/m  •  ur> 


Cov(zr,zs)  =  ars  + 


(m-l)/m 


W 


Clearly,  if  the  mean  vector  of  X  was  p  =  (0,0,  . ..,0)  ,  then  the  mean  vector 
and  covariance  matrix  of  Z  would  be  identical  to  those  of  X.  In  the  less 
idealized  situation  with  which  we  are  confronted,  the  translation  to  the 
sample  mean  of  the  nearest  neighbor  cloud  should  result  in  the  pseudo-observa¬ 
tion  having  very  nearly  the  same  mean  and  covariance  structure  as  that  of  the 
(truncated)  distribution  of  the  points  in  the  nearest  neighbor  cloud,  a  con¬ 
jecture  substantiated  in  many  actual  cases  that  have  been  considered.  For  m 
moderately  large,  our  algorithm  essentially  samples  from  n  Gaussian  distribu¬ 
tions  with  the  means  and  covariance  matrices  corresponding  to  those  of  the 
n  m-nearest -neighbor  clouds. 


III.  EXAMPLES 


For  a  substantial  test  case,  we  considered  a  mixture  of  three  bivariate 
normal  distributions.  The  first  (N^)  has  mean  vector  (_*)  and  covariance 


matrix  ( 


-1/2 


-1/2  -2 

j  ) ;  the  second  (N2)  has  mean  vector  (  )  and  covariance  matrix 


11/2  2 
(l/2  j  );  and  the  third  (N^)  has  mean  vector  (^2)  and  covariance  matrix 


Ll/10 


.  The  corresponding  mixing  scalars  are  ot^  =  1/2,  a2  =  1/3,  and 


dj  =  1/6,  respectively.  To  establish  a  data  base,  a  sample  of  eighty-five 

points  was  generated  from  this  distribution  via  Monte  Carlo  simulation,  and 
appears  in  Figure  1;  a  sample  of  eighty-five  pseudorandom  values  was  then 
produced  by  the  algorithm,  and  the  combined  sample  is  shown  in  Figure  2. 


Notice  that  the  structure  of  the  data  is  maintained  in  that  the  modes 
are  preserved;  the  algorithm  has  not  attempted  to  fill  in  gaps  where  gaps 
belong;  the  algorithm  has,  however,  generated  some  points  outside  the  boundary 
of  the  convex  hull  of  the  data  base,  all  of  which  are  desirable  properties. 
These  observations  lend  credence  to  the  term  "structural  integrity"  mentioned 
previously. 


An  application  of  the  algorithm  to  a  real  world  data  set  is  summarized 
in  Figures  3  and  4.  In  Figure  3,  a  two-dimensional  marginal  of  a  set  of  973 
four-dimensional  behind  armor  debris  measurements  is  portrayed;  in  Figure  4 
973  simulated  data  points  are  produced  by  our  procedure.  Once  again,  the 
salient  features  of  the  data  set  are  preserved. 
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Figure  2.  Combined  sample:  Data  base  and  pseudo-observations. 


BEHIND  RRMOR  DRTfi 


BEHINO  RRMOR  DRTR 


Figure  4.  Simulated  behind  armor  debris 


LEGEND 
O -  DRTfl 


0.01  0.02 
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Figure  5.  Data  base  for  MLRS  bomblets. 


Figure  6.  Simulated  MLRS  bomblet  data. 


Figure  7.  Combined  sample:  Data  base  and  pseudo-observations. 


Figures  5  through  7  display  the  results  of  applying  the  algorithm  to  a 
data  base  of  modest  size.  Here  a  set  of  49  bivariate  measurements  on  MLRS 
bomblets  shown  in  Figure  5  was  supplemented  by  an  additional  160  pseudo¬ 
observations  (Figure  6),  with  the  results  portrayed  in  Figure  7. 

The  FORTRAN  program  of  the  algorithm  appears  in  Appendix  A.  This  program 
as  listed  produces  plots  using  the  IMSL  Library;  Figures  1-7  are  plots  produced 
by  DISSPLA. 


IV.  CONCLUSIONS 

We  have  demonstrated  a  means  of  empirical  random  number  generation 
based  on  a  sample  of  observations  of  a  random  variable  X.  No  estimation  of 
the  underlying  density  is  required.  And,  because  of  the  local  nature  of  the 
generation  scheme,  it  is  essentially  free  of  assumptions  on  the  underlying 
density  of  X.  Naturally,  any  attempt  to  use  this  algorithm  for  generating 
bona  fide  new  observations  using  the  computer  rather  than  producing  real 
world  data  would  be  unwise.  Rather,  the  algorithm  operates  somewhat  like  a 

smooth  interpolator -  highly  dependent  on  the  quality  of  the  data  points 

on  which  it  is  based.  It  gives  us  a  means  of  avoiding  nonrobust  conclusions 
due  to  "holes"  in  the  data  set  at  important  points  of  the  simulation  model. 
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APPENDIX  A.  COMPUTER  PROGRAM 


STEP  1.  CREATION  OF  THE  DATA  FILE 

Data  should  be  created  and  stored  as  an  MFA  permanent  file.  This  file  should 
contain  the  following  information.  Note:  Let  pi  denote  program  limitations. 

CARD  1.  5  INPUTS  FORMAT  COLUMNS 


1.  NUMDAT  -  number  of  input  data  points  14  1-4 

pi.  5  <  NUMDAT  <  1000 

2.  NUMGEN  -  number  of  pseudopoints  generated  14  6-9 

pi.  5  <  NUMGEN  <  1000 

3.  IDIM  -  dimension  n  of  n-space  data  II  11 

pi.  1  <  IDIM  <  8 

4.  NRANDM  -  random  number  seed  13  16-18 

pi .  1  <  NRANDM  <  999 

5.  NUMPLT  -  number  of  plots  requested  12  21-22 

Note  -  all  possible  2-d  plots  (NUMPLT  =  -1) 

-  no  plots  (NUMPLT  =  0) 


-  r  plots  (NUMPLT  =  r) 

-  NUMPLT  plots  will  be  generated  for  both  data  and  pseudo-observations, 
pi.  -1  <  NUMPLT  <  IDIM(IDIM-l)/2 

CARD  2  5  CARD  3 

If  NUMPLT  =  r,  request  plots  of  variables  vs.  Y.  by  indicating  j  in  Card  2 
and  the  corresponding  i  in  Card  3.  1 

Column  1  2  ...  r 

Card (2)  Y2  ...  Yf 

Card (3)  Xj  X2  ...  Xr 

If  NUMPLT  =  0  or  -1,  cards  2  and  3  should  not  be  included,  otherwise  information 
on  these  cards  would  be  used  as  data. 
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Card (4) 

,  Card(5)  ... 

Card (NUMDAT) 

These  cards 
consist  of  a 

should  contain  data  to  be  read  in  with  F10.3  format, 
maximum  of  8  variables  X^  -  Xg. 

Data  may 

Columns 

1  -  10 

11-20 

71  -  80 

Card(4) 

X^  Data 

X2  Data 

Xg  Data 

Card (5) 

Xj  Data 

X2  Data 

Xg  Data 

Card (NUMDAT) 

X.^  Data 

X2  Data 

Xg  Data 

STEP  2 

Before  the  program  can  be  run,  the  data  file  must  be  made  accessible  to  the 
MFZ  with  (PERMIT,  PFN,  MFZ) . 

STEP  3 

To  run  the  program  create  and  submit  the  following  3  card  MFZ  job. 

JOBNAME,  STMFZ ,  T100. 

ACCOUNT,  XXXXXXX. 

BEGIN,  DBRNG,  DBRNG ,  PFI  =  _ ,  PFO  =  _ ,  UN  =  _ ,  RJE  =  RJEXXXX. 

Where 

PFI  =  file  name  under  which  the  input  data  file  is  stored, 

PFO  =  file  name  under  which  the  pseudo  data  is  to  be  stored, 

UN  =  user  name  identification  for  above  two  files, 

XXXX  =  a  4  digit  code  designating  a  particular  RJE,  as  the  output  device. 

If  omitted,  then  the  central  site  will  serve  as  the  output  destination. 


20 


ooo  oooo  ooo  ooo  ooo 


C  HIS  »R0GRaM  PRODUCES  PSEUDORANDOM  OBSERVATIONS  FROM  REAL 
C  OATA  FOR  UNIVARIATE  AND  ,1 JLT  t  V  AR  I A  TE  CASES.  THE  PSEUDO- 
C  RAN00M  OBSERVATIONS  WILL  NAINTAIN  THE  CHARACTERISTICS  OF 
C  THE  REAL  DATA  WITHOUT  ANT  DISTRIBUTION  ASSUMPTION  ON  THE 
C  °0PU L A T ION  FRON  WHICH  THE  REAL  DATA  CANE.  AN  EXAMPLE  OF 
C  PROPER  'JSE  OF  THIS  PROGRAM  WOULD  BE  IN  THE  CREATION  OF  PSEUDO- 
C  RANOON  OBSERVATIONS  =0R  INPUT  TO  A  COHPUTER  SIMULATION 
C  HODEL. 

C 

C  CAJTION!  THIS  PROGRAM  DOES  NOT  PRODUCE  REAL  OaTa  and  PRODUCED 
C  PSEU00R4N00M  OBSERVATIONS  SHOULD  NOT  BE  UScO  AS  SUCH. 

C 

C  N'J.MOAT-  NUHBER  OF  INPUT  OATA  POINTS 

C  NUNGEN-  NUMBER  OF  PSEJ0QP3INTS  TO  BE  GENERATED 

C  IDIM-  NUMBER  OF  VARIABLES  IN  INPUT  OATA  SET 

C  NRANON-  RANOON  NUMBER  SEED 

C  NUMPLT-  NUMBER  OF  PLOTS  REQUESTED 

C  DATA-  MATRIX  HOLOING  INPUT  DATA  SET 
C  NPLT-  MATRIX  HOLOING  PLOT  REQUESTS 
C 

PROGRAM  D9RNGdNPUT,0JTPUT,TAPEb«0UTPUT»  TAPE 3,  TAPER) 

C  - 

DIMENSION  DATA!  1000,  ID ),®SEUDO (1000, 10 ) , NP LT ( l, AO ) , ZMIN ( 10 ) 
DIMENSION  ZMAXdO) 

READ  ANO  WRITE  INITIAL  IN»UT  VARIABLES 

RSAD{8, 1 00  0)N'JMDAT, NJMGE.N, IDIM, NR  ANDM,  NUMPLT 
VR I T  £  (  S»  20  30 ) 

WRITE  (3, 2040INUMD4T,  NUNGE N, I D I M, NR ANDH, N UNPL T 

CHECK  FOR  INVALID  VALUES  IF  INITIAL  INPUT  VARIABLES 

CALL  CHECK (NJMO AT, NUNS EN, IDIM, NR ANDH, NUMPLT, NC HECK) 

Ic  (  NC HE  C  K.  E Q.  1 )  GO  TO  RO 
IF(NUHPLT.GT.O)GO  TO  10 

ESTABLISH  NPLT  PQR  ALL  POSSIBLE  IOIMdOIM-1)  fl  PLOTS 

CALL  SETPLTI NPLT,  IDIM) 

GO  TO  40 

USER  ESTABLISHES  OESIREO  »LOTS 

T  VALUES  FIRST  CARO,  CORRESPONDING  X  VALUES  SECOND  CARD 
10  DO  20  K *  1 » 2 

RE  AO  (9.  1010  MNP  LTU,  L>,L*  1,  NUMPLT) 

2D  CONTINUE 

IF  USER  DEFINED,  WRITE  PLOTS  REQUESTED 

WRITER.  20501 
00  30  K  » 1* 2 

WRITE  lb,  2070)  (NPLTU.L),L»1»NUNPLT> 

30  CONTINUE 
C 

C  READ  REAL  DATA  ANO  WRITE  FIRST  FIVE  POINTS 
C 

40  DO  50  I«l>  NUMOAT 

RcAD  (8,1020)  (DAT  Ad,  J)»J»1,IDIH) 

50  CONTINUE 

VRITEtb,  20  %f.  I 
00  SO  I«l*  5 

WRITE (6,  2D  20  I  (OATA (I, j),U«l, IDIM) 


BO  CONTINUE 
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C  VARY  r  HE  RANDOM  NUM3ER  SEQUENCE  USING  INPUT  VARIABLE  NR  ANON 
C 

00  70  <» 1#  NRANON 
RM«RANF (0) 

70  CONTINUE 
C 

C  WRITE  HEADER  FOR  DUT»JT 

V* 

WR  I TE ! b »  200C ) 

OETERNINE  THE  CORRELATION  MATRIX,  mean  and  VARIANCES  FOR  REAL  OaTA 

CALL  CORRELtOATA.IDIM.NdNDAT) 

C 

C  SCALE  DATA  SO  THAT  EACH  VARIABLE  WILL  CARRY  EQUAL  WEIGHT  IN 
C  THE  NEIGHBORHOOD  SELECTION  PROCESS 
C 

CALL  SCALE  <  DATA. NJND AT. ID  IN. Z MIN, Z MAX) 

C 

C  GENERATE  NJMGEN  PSEU03DATA  POINTS 
C 

CALL  GNtRATlOATA.NUMDAT.NUNSEN.IDIM,  PSEJDO) 

C 

C  RESCALE  THE  DATA  AND  THE  CORRESPONDING  PSEUDODATA  TO  THEIR 
C  ORIGINAL  NAGNITUOES 
C 

CALL  RESCALIDATA.NUNOAT, IOIN.ZMIN, ZNAX) 

CALL  RESCAL(PSEUDO.NJNGEN.IDIN.ZNIN.ZNAX) 

C 

C  WRITE  HEADER  FOR  OUT»'JT 
C 

WR I TE | b« 2010) 

C 

C  OETERNINE  THE  CORRELATION  MATRIX,  MEAN  AND  VARIANCES  FOR  PSEUOO- 
C  DATA 
C 

CALL  CORRELtPSEUOO.IOIN.NUMGEN) 

C 

C  WRITE  THE  PSEUDORANDOM  OBSERVATIONS  ONTO  A  PERMANENT 
C  FILE  (  PSEUDO  ) 

C 

DO  BO  J-l.NUNGEN 

VRITE<9,2020><P$EdOO!J»L)»L*l,I0IM) 

80  CONTINUE 
C 

C  CALL  PLOT  ROUTINE  IF  REQUESTED. 

C 

IDUNY-I0IN*UDtN-l)/2 
IF(NUMPLT.EQ.-1 )NJMPLT*IDUNY 
IF1NUMPLT.LE.OJGO  TO  70 

CALL  PLOT!  PSEUDO, OATA, ZM  I N,  Z  M  A  X,  NPL  T,  NUN  DAT,  NU  MG  E  N,  NUN  PL  T  > 

90  COST INJE 

WR ITE lb, 2380) 

STOP 

1000  FORMAT!  I  A, IX.  I  A, IX. II. AX, I3.2X.I2) 
lGlO  FQRMAT136I1) 

1020  FORMAT! 8F10. 3) 

2000  FORMAT! 1H1.20X. 'CORRELATIONS,  MEANS  AND  VARIANCES  OF  INPUT  OATA  SE 
♦T',/1 

7010  FORMAT!/// »23X> 'CORRELATIONS.  MEANS  AND  VARIANCES  JF  »SEU0O  DATA  S 
♦  ET* ,  /  ) 

2020  FORMAT! AX. 8(F12. 3.  3X  )  ) 


20  30  FORMAT!  1HI  ,  AX,  'NJNDAT* ,2X, • NJMGEN' ,3X, • I  DIM' ,3X,  ' NRANOM • , 
♦  2X, • NUMPLT 1 ) 

7  JAO  FORMAT! / , a  X , 5  (  I A, 9 X >  ) 

2050  FORMAT!///, 5X1 PLOTS  REQUESTED  Y  OVER  X',/) 

2w60  FORMAT! ///»  5X»  •  F  nor  PUE  OATA  POINTS'./) 

2070  FORMAT! 5X, 381? ) 

2080  FOR.NATtHl,  '  '  ) 

END 
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c 

c 

c 

C  THIS  SUBROUTINE  CHEC<S  He  INITIAL  INPUT  DATA  FIR  VALUES 
C  WHICH  V I L  L  CAUSE  THE  PROGRAM  TO  FAIL.  FOR  EXAMPLE  NUMDAT 
C  CAN  SE  A  MAXIMUM  OF  IjOU  AS  TH£  DIMENSION  ST  AT  E1ENT  ONLY 
C  ALLOWS  FOR  1000  DATA  POINTS.  IF  AN  INCORRECT  VALUE  IS 
C  OETECTEO,  NCHECK  WILL  BE  SET  TO  1.  WHEN  RETURNED  AS  1 
C  THE  PROGRAM  WILL  STOP.  IF  RETURNED  AS  0  THE  PROGRAM  WILL 
C  CONTINUE  NORMALLY. 

C 

SUBROUTINE  CHEC K ( N JMO A T,  NUHGcN , I DI N, NR  AN DN,N UN  PL T* NCHE CK ) 

C - 

IF!NUHDAT.GT. 1003. OR. NUMDAT. LT.5IG0  TO  100 

IF!NUM3EN.LT.5.0R«NUM3EN.GT.10CC)GO  TO  200 

IF(IDIM.LT.1.0R.IDIH.3T.8)GO  TO  3GU 

IF(NRANDM.LE.D) GO  TO  400 

K*IDIM*I IDIM-1 1  n 

IF!NUM?LT.GT.K)GO  TO  BOO 

IF(NUNPLT.LT.-i ) 30  TO  500 

IF( IOIM.eQ.l.AND.NUMPLT.£Q.-l)GO  TO  50l 

nchecko 

return 

100  WR I Tc (6. 2000 ) 

NCH£CK«1 

RETURN 

200  WRITE(6»2010) 

NCHECK-1 

RETURN 

3uO  WRITE (6. 20201 
NC  HECK* 1 
RETURN 

400  WRITE  <5,  2030 
NC  H  EC  K* 1 
RETURN 

500  WRIT£<6.2040) 

NCH6CK-1 

RETURN 

2300  F0RMAT(//»5X*' INVALID  NUMBER  OF  INPUT  DATA  POINTS') 

201 J  FORMAT!//. 5X» ' INVALID  NUMBER  OF  PSEUDO  DATA  POINTS') 

2020  FORMAT! //. 5X. ' INVALIO  DIMENSION  N  OF  N-SPACE  DATA') 

2C30  FORMAT!//. 5X. 'INVALID  SEED') 

2043  FORMAT!//. 5X, 'INVALID  NUMBER  OF  2D  PLOTS  FOR  DIMENSION  SPECIFIED') 
END 
C 
C 
C 

C  THIS  SUBROUTINE  INTIALIZcj  THE  NPLT  MATRIX  SO  THAT  ALL  POSSIBLE 
C  2-D  PLOT  COMBINATIONS  ARE  CONSIDERED.  THERE  IS  A  TOTAL  OF 
C  IOIM! I0IM-1) /2  PLOTS  WHICH  COULD  BE  MADE.  IN  MAIN  IF  NUMPLT-C.  NO 
C  PLOTS  WILL  BE  MAOE.  IF  NUMPLT— 1.  ALL  PLOTS  WILL  BE  MADE. 

C 

SUBROUTINE  SE TP L T ( NP LT » I D IN) 

C - 

DIMENSION  NPLT! 2,40) 

K  «  1 


II-IDIM-1 
DO  20  1*1.  II 
JJ«I+1 

DO  ID  J*JJ»  IOIM 
NPLT(l.K).  I 
NPLTI2.K1* J 
K-K  +  l 

10  COST INJE 
20  CONTINUE 
RETURN 
END 
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C  THIS  SUBROUTINE  SCALES  THE  DATA  SC  THAT  EACH  VARIABLE  WILL 
C  CARRY  E  DUAL  WEIGHT  IS  THE  HE IGHB3RH0QD  SELEC  TIQN  PROCESS.  THE 
C  SCALED  DATA  WILL  THEN  BE  RETURNED  TO  MAIN.  THE  PROCESS  JSED 
C  IS  (  X(I)-MIN<X(I)  >  )  t  RANGEIXHIJ)  FOR  EACH  VARIABLE. 

C 

SUBROUTINE  SCALE(TOATA,NSORT, IOIM, ZMIN.ZMAX) 

c - 

DI HE  NS  I  ON  TDATA< 1)00,101, ZMIN( 10). Z MAX <1C> 

I  NP'JT  FOR  BUBBLE  SORT 

NT0P*NS0RT-1 
C 

C  LOOP  WHICH  SORTS  ON  EACH  VARIABLE  (  NRANK  )  AND  THEN 
C  ESTABLISHES  ITS  MINIMJH  AH  D  MAXIMUM  (  ZKIN  )  AND  (  ZNAX  > 

C  R£S*ECTIVELY 
C 

DO  10  1*1,  IDIN 
NRANK» I 

CALL  SORT( TDAT  A, NSORT ,NTOP, NRANK, IDIN) 

ZNI N < I ) « TO ATA 1 1 »  I ) 

ZNAX( I) «TOATA<  NSORT,  I  ) 

10  COST  INJE 
C 

C  LOOP  WHICH  PERFORNS  THE  ABOVE  MENTIONED  TRANSFORMATION 
C 

DO  30  J*l, NSORT 
DO  20  X* 1,  IO1 1 

TO ATA  (J  ,  K) ■ (TOATAH  J,  K)-ZNIN<iO  )/(ZMAX(K) -ZMIN( K)  ) 

20  CONTINJE 
3C  CONTINUE 
RETURN 
END 
C 
C 
C 

C  THIS  ROUTINE  SORTS  THE  OATA  MATRIX  ON  POSITION  NRAN< 

C  THE  SORT  USED  IS  A  COMMON  BUBBLE  SORT  WHICH  WILL  ESTABLISH 
C  THE  FIRST  STOP  POINTS  FROM  SMALLEST  TO  LARGEST  -  WHERE 
C  SMALLEST  T]  LARGEST  IS  DETERMINED  BY  POSITIJN  NRAHK.  NOTE 
C  THAT  WHEN  AN  EXCHANGE  TAKES  PLACE  THE  ENTIRE  ROW  VECTOR. 

C  SOME  POINT  (W,X,Y....,D)  IS  EXCHANGED.  NOTE  ALSO  THAT  0 
C  REPRESENTS  THE  DISTANCE  S3UARED  COMPUTED  IN  EUCLID  AND  STORED 
C  IN  POSITION  IDIST. 

C 

C 

C 

SUBROUTINE  SORT  (SO ATA,  NSORT,  NT  OP, NRANK, I  DIM) 

DIMENSION  SOATAl 1003, 10) 


IOIST.IDIM+I 

C 

C  TAKE  THE  FIRST  I'TH  VALUE  AND  COMPARE  IT  TO  THE  I+1'TH 
C  VALUE.  IF  THE  I'TH  VALUE  IS  SMALLER,  EXCHANGE  IT  »ITH 
C  THE  I+l'TH  VALUE  SO  THAT  THE  I'TH  VALUE  IS  THEN  SMALLER. 
C  THEN  COMPARE  THE  I'TH  VALJe  WITH  THE  I*2'TH  VALUE  AND 
C  SO  ON. 

c 

DO  30  1*1, NT 0* 

L  *  I  +  1 

DO  20  J-L. NSORT 

IF ( SOAT A ( I .NRANK I. LT. SOATa ( J, NRANK ) ) GO  TO  2C 
DO  10  <*1,  IDIRT 
TEM»*SDATA<  1,0 
SDaTACI.O  *  SO  AT  A  (  J  ,  K  ) 

SOATA(J,K)-TEMP 
10  CONTINJE 
20  CONTINJE 
30  CONTINJE 
RETURN 
END 
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THIS  SUBROUTINE  DOES  ME  ACTUAL  GENERATION  OF  THE  PSEUDORANDOM 
OBSERVATIONS*  AND  RETJRNS  THEN  IN  A  MATRIX  (  PSEUDO  ).  THE 
ALGORITHM  JSED  WAS  DEVELOPED  BY  DR.  JIM  THOMPSON  OF  RICE  UNIVERSITY 
ANO  DR.  MALCOLM  TATLOR  OF  3RL. 

SUBROUTINE  GNE  RATI  DA T A ,NUMDA T*  NUMGEN *  I DI M, PSEUDO  > 


DIMENSION  DAT4(1000.10)»PSEUD0U000»10>*  AVER  AG  (  iC  ) 
DIMENSION  TRANSD(ZS.IO) 

INITIALIZE  THE  MATRIX  PSEJDO  TO  ZERO. 

DO  20  L  *  1*  NUMGEN 
DO  10  K»l*  I  DI M 
PS  E  UDC( L  *  K ) «0. 

1J  CONTINUE 
20  CONTINUE 

ESTABLISH  THE  SIZE  OF  THE  NEIGHBORHOOD  OF  NEAREST  POINTS  TO 
BE  USED  IN  A  LINEAR  COMBINATION. 

NEIGH B«INT(FLOAT(NUMQAT)/20. » 

IFINEIGH8.lt. 5INEIGH8-5 
IF (NEIGH8. ST. 20 ) HE I3HB*20 

I DIST  MARKS  THE  COLUMN  WHERE  THE  EUCLIDEAN  DISTANCE  SQUARED 
WILL  BE  STORED. 

ID  I ST*ID IM*1 

INITIALIZE  THOSE  DISTANCES  AS  ZERO  TO  PREVENT  COMPUTER  ERROR 

00  30  J-l.NUMOAT 
DATA!  J.  IDISTAO. 

30  CONTINJE 

WEIGHT  IS  THE  WEIGHTING  FACTOR  TO  BE  USED  IN  CALCULATING  THE’ 
MEAN  OF  THE  Nc IGHB  NEAREST  NEIGHBORS.  IT  ALSO  SERVES  AS  THE 
MEAN  OF  THE  SPECIAL  UNIFORM  DISTRIBUTION  USED  IN  ThE  LINEAR 
COMBINATION. 


C 

WE IGHT«1 .1 FLOAT ( NE I5H3  ) 

C 

C  UN AD J 1  HELPS  DEFINE  THE  UNIFORM  DISTRIBUTION  WITH  MEAN 
C  WEIGHT. 

C 

UNA0J1«I3.*(FL0AT(NEIGH3)-1. )/(cLOaT (NEIGHB)**2. ) )  **  «  5 

C 

C  THE  FOLLOWING  LOQ°  GENERATES  NUMGEN  PSEUDORANDOM 
C  OBSERVATIONS. 

C 

DJ  120  JJJ-1. NUMGEN 
C 

C  INITIALIZE  THE  AVERAG  ARRAY  EACH  TIME  4  NEW  POINT  IS  CHOSEN. 

C 

DO  AO  HS£T«1,IDIM 
AVERAGt  NSET»*D. 

40  CONTINJE 
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c 

C  RANOONLY  PICK  A  OATA  POINT  (  XCENTR  )  AROUND  WHICH  A 
C  A  NEIGHBORHOOD  WILL  3c  FDRNSO. 

C 

RN-RANFIO) 

KCENTR*  INTI  RN*FL  DAK  NUMD  ATM  *1 

r 

C  ESTABLISH  THE  EUCLIOEAN  DISTANCE  SOJARED  OF  ALL  POINTS 
C  fuqm  KCENTR. 

c 

CALL  EJCLIDIDATA.HJNDAT. IDIN# KCENTR ) 

L# 

C  SORT  THE  POINTS  ON  THEIR  EUCLIOEAN  DISTANCE  FROM 
C  SMALLEST  TO  LARGEST.  IN  THIS  FASHION  THE  NEIGHB  NEAREST 
C  NEIGHBORS  WILL  BE  CHOSEN. 

C 

CALL  $0«T<0ATA.NUN0AT,NEIGH3>IDIST,IDIM> 

w 

C  CONFUTE  THE  AVERAGE  Oc  X,Y,Z,...  IN  <X,Y,Z#...)  OF  KCENTR 
C  AND  ITS  NEAREST  NEIGHBORS 
C 

DO  t>C  1*1#  NEIGHB 
DO  50  1*1#  IDIN 

AVERAGUJ*  AVERAGI  J  >*OATA(  I#  J)*Wfc  IGHT 
5J  CONTINJE 
60  CONTINJE 
C 

C  CREATE  A  TRANSLATED  DATA  SET  (  TRANSD  )  TO  HE  USED  IN 
C  THE  CREATION  OF  ONE  *3INT. 

C 

03  00  N*l» NEIGHB 
DO  70  L*l. IDIN 

TRANSO<M,L)«DATA(i#U- AVER  AGIO 
70  CONTINJE 
80  CONTINJE 

C  BEGIN  THE  LOO®  WHICH  CREATES  A  NEW  **01  NT  BY  TAKING 
C  a  LINEAR  COMBINATION  IF  THE  TRANSLATED  DATA. 

C 

DO  IOC  I-1.NSISH8 
C 

C  ESTABLISH  A  RANDOM  NU1BER  FROM  THE  SPECIAL  JNIFJR1 
C  DISTRIBUTION  TO  BE  MULTIRLIEO  BY  ONE  DATA  VECTOR 
C  f  X# Y,z,  ..  .  )  . 

c 

RN  »R ANF (0) 


RN*RN*Z*UNAOJ l* ( WE IGHT-UNADJ1  ) 

C 

C  LOO*  WHICH  AODS  THE  TRANSFORMED  VECTORS  TO  CREATE 
C  ONE  NEW  POINT 
C 

00  90  1*1# IDIN 

PStUDOI  J JJ»  J)*PS : JDO( JJJ# J)*RN*TRANSD(I»  J) 

90  CONTINJE 
100  CONTINJE 
C 

C  LOO®  WHICH  AODS  BACK  IN  THE  AVERAGE  OF  THE  NEIGHBORHOOD 
C  WHICH  WAS  TAKEN  AWAY  FROM  <  TRANSD  ) 

C 

DO  110  L*1»IDI1 

PSE  JOGC  J  JJ.L)  -  PSE  JD3U  JJ  .  L»*AVERAG(L  » 

110  CONTINJE 
120  CONTINJE 
RETURN 

END 
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c 

c 

c 

C  THIS  SUBRCJTINE  COMPUTES  THE  MEAN  VECTOR,  VARIANCE  VECTOR 
C  AND  CORRELATION  M4TRIX  OF  ANT  NATRIX  SUBMITTED  TO  IT.  IT  THEN 
C  VRITES  THIS  I, NFOR NATION  ON  HARD  COPY. 

C 

SUBROUTINE  CORRE  l  (  CD  A  T  A,  I D  1 .1,  NUMC  OR  ) 

C - 

DIMENSION  COAT  At  1300,10) ,SUMX I 1 1C), SUM  XT (10,10), VAR< ID), LINE (3) 
DIMENSION  AVR(10I,C0RR(10,10),LINE1( 40), LINE2( 18) 

C 

C  INSURES  OUTPUT  NEATNESS 
C 

DATA  LINE/*X1*,»X2',,X3',,XH*,'X5«,*X6',,X7»,'X8»/ 

DATA  LINEa^C',*!','*'*^*, •£•,•!•, 'A', 'TS'I'j'ON’N*,'  », 

♦  •M' > ' 4' > 'T*»'R* , • I  •*  •  X«/ 

DO  10  1-1,40 
LINEim-'  • 

10  CONTINUE 

INITIALIZE  THE  SJN  OF  X(I)  AND  THE  SUM  OF  X(I)*X(J)  TO  ZERO. 

DO  00  l-WIOIM 
SUNXI M ) • 0 . 

DO  20  MM«1,I0IM 
SJMXYtM, MM ).o. 

20  CONTINUE 
30  CONTINUE 

COMPUTE  THE  SUM  OF  X(()  AND  THE  SUM  OF  X(I)*X(J). 

DO  60  1*1, NUNCOR 
DO  50  J- 1.  1011 

SJMXI  (J  ) -SUHXIt  J  JKDATAt  I,  J) 

KR.J 

DO  40  <*K<,IDIM 

SJMXf U»KI «SUMXY(J  »K)*C0ATA( I, J»*CDATA(I,KJ 
40  CONTINJE 
50  CONTINUE 
60  CONTINUE 

SET  THE  REAL  VALUE  0<=  THE  NUMBER  OF  POINTS  CONSIDERED. 

V.NUMCOR 


C 

C  COMPUTE  THE  MEANS  AND  SAMPLE  VARIANCES  OF  X,Y,Z...  OF 
C  (X,Y,Z,...)< 

c 

00  70  M* 1, IDIM 
AVR(M)*SUMXI(M»  H 

VARtM).<  SUMXYtM,  M)  /V-AVR  ( .1)  *«2 . 1*4/ (  V-l.  ) 

70  CONTINUE 
C 

C  COMPUTE  THE  CO  RRe  L A  T I  ON  Or  X(L)  AND  X(K). 

C 

DO  <J0  L-l,  IDIM 
KK  *  L 

DO  80  <*K<,IDIM 

C0RR(L.K)*3UNXY(L.K)-SUMXICL)*SUMXHK)/V 
CORRtL.KI.CORRtL.H)/ ( C  W- 1. ) * ( V AR ( L > *  VAR ( K ) )**.5) 
80  CONTINJE 
<90  CONTINJE 
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THE  FOLLOWING  ARE  ALL  aios  iN  WRITING  THE  INFORMATION  in 

AN  acceptable  fashion. 

T  c  MP*  FLO  AT ( 1DIN) 

NTO*INT(TEMP/2.*9.  ) - 5 

WR  ITEt^ZOOO)  (LINEll  J)  ,J»1.NTC>,  (LINE2(1),I«a,18  > 
WRITS <4, 2010) CLINE (<  ) ,  K- I,  1 0 1 M  ) 

DO  100  1-1*1011 
IF ( 1.63.  IDIM)GO  TO  110 
K  *  I  ♦  1 

WRITE  (6*  20'2O)  I*  (  C  I RR  (  J  » II  »  J  •  1 .  I) .  (  C  QRR  1 1  *  J  I »  J- K*  ID  1 1 ) 
100  CONTINUE 

110  WRITE (4, 20 20 10  IN, !C0«R(J,I0IM),J-1,IDIN> 

WRITE <4.  20  30) (LINE (K> , K-l, 10  IN) 

WRITE (4. 20  AO  I (AVRII) , 1-1, 1011) 

WRITE (6, 20  50 (WAR! J).J«1,  IDIrt) 

2000  FORMA  T( 1  HO , 53  A l ) 

2010  FORMAT! /»9X»A2*  7<  7X*  A2)) 

2020  FORMAT!//, IX, IX',I1,3X,8(F8»5, IX)) 

2030  FORMAT! / // , 7X, 8! 10X, A?  I ) 

20 AO  FORMAT!/, 2X,' MEAN* ,3X,3(1X,F11. 31) 

20  50  FORMAT! /  *2X** WAR*. AX*3(1X,F11. 3)) 

RETURN 

END 

C 

C 

C 

C  THIS  SUBROUTINE  CALCULATES  THE  EUCLIDEAN  DISTANCE 
C  SOUAREO  BETWEEN  kCcNTR  AND  ALL  OTHER  POINTS. 
r 

SUBROUTINE  EJCLIO! 00  AT  A, NU10AT,  IOIM,KCEWTP ) 


0I1ENSI0N  OOATAI 1000, 10) 

IOIST-IOIM+1 
DO  20  1*1, NU10AT 
0S3UAR-O. 

03  10  L-l.IOIM 

Oi  )UAR»DSOOAR-(  OOATA! J , L  > -00  AT  A ! KC E N TR , L  )  )**2. 
10  CONTIhJt 

OOATA! J,  10  1ST) " 0  S  0  UAR 
20  CONTINUE 
RETURN 
ENO 
C 
C 


C 

C  THIS  SUBROUTINE  RESCALES  THE  DATA  AND  PSEUDO  DATA 
C  BACK  TO  ITS  ORIGINAL  MAGNITUDE. 

C 

SUBROUTINE  RSSCAL(R0ATA,NUM3ER,IDIM,ZMIN,ZMAX> 


DIMENSION  RDATA! 1300. 10) , MIN  (10), 2 MAX (1C  I 
DO  20  J*l» NUMBER 
DO  10  <■ 1*  ID1 1 

ROATA(J,K)*ROATA(J,<)*(ZNAX(<)-ZMIN(K))»ZNIN(K) 
10  CONTINUE 
2C-  CONTINUE 
RETURN 
ENO 
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HIS  ROUTINE  t»«£a4^ES  FOR  THE  CALLING  OF  AN  MSL  PLOT  ROUTINE  S3 
THAT  ALL  RE  2  UE  S  TE  0  2-0  PLOTS  CAN  BE  MADE. 


SUBROUTINE  PL OT <  P S EUO 0 » 0  A TA,  L NIN » Z N A  X, NP  LT, NUN  0 A T, NUNGEN , NUHPl T ) 


01  It  NS  I  ON  PSEUDOUOJC.  10  1,0  AT  A  (1CCC.1C  ).  ZNIN  (It )  .ZNAX(IO) 

OMEN  SION  NUN  <  i  I  .  HE  40  <  2  J )  ,  HE  AD1<  22  )  >  IT  I  Tl  E  <1 44  )  »  JT  ITl:  <  1<*4  ) 

DINE  NS  ION  *(1000). Y< 1000 ). IN  AG A ( 51 5i ). ICHAR( 13 ). RANGE! 4) 

3MENSI0N  NPLT(  2.40) 

ESTABLISH  SOME  HOLLERITH  STRINGS  FOR  CLARITT  GF  OJT»UT. 

OATA  NJN/'1'.'2',»3'.'4','5','6','7','B'/ 

DATA  HEAD/ 'O'.  'A' »  '  T».  •  A',  •  '.'P'.'O'.'I'.'N'.'T'.'S'.'  ', 

♦  '  X  '  .  '  ' , •  V .' S •  x  •  / 

OAT  A  HEA01/'®*»'S'»'S*»'U'»'D'*'3'»'  '.'P'.'O'.'I'.'N'.'T'.'S'. 
f  '.'X'.»  '.'  '. 'V '.' S •  ','*•/ 

DO  10  J«l. 144 
ITITLF(J)*'  ' 

JTITLE(J)*'  ' 

10  CJNTINJE 

00  20  Ml.  20 
L*  H  34 

ITITLE(L)«lHEAO(I) 

20  CONTINJE 

03  30  Ml. 22 
L* J '33 

J  T  r  TL  E  < Ll«IH£401(J) 

30  CONTINJE 

I T I TL  El  47)  •'*' 

JTITLEt  97) *'X' 

ITITLE<124)«'X> 

JTITLE(12SM'X' 

SET  VARIABLE  VALUES  ASKED  POR  BY  IHSL  ROUTINE 

N«  l 

INC  *  l 
I0PH1 

BEGIN  LQO»  WHICH  PLOTS  FIRST  THE  DATA  aND  THEN  THE  P  SEL03  OATA  ON  TWO 
SEPERATE  PLOTS  FOR  A  GIVEN  X  (  I  )  AND  XU) 

00  60  Ml.NUHPLT 
C 

C  K  AND  L  ARE  THE  XU)  AND  <<*)  COORDINATES  RESPECTIVELY 
C 


K*NPLT(1»J  ) 

L»NPLT( 2.J  ) 

C 

C  ESTABLISH  THE  ENO  »3MTS  OF  THE  X  AND  Y  AXES. 

C 

RANGE(1)*2NIN(L)-.10*(ZHAX(L)-ZHIN(L)) 
RAHGS<2HZ‘*AX<LU.10*<ZHAX(L)-ZNIN<L>> 
RANGE(3)-7NIN(K)-.10*(ZNAXCK)-ZNIN(K)) 
RANGE (4) «2NAX(K  )♦. 10* ( ZN  A  X  ( K  )  -  ZH  IN  (  <  )  ) 

c 

C  ASSIGN  THE  X  ANO  Y  VECTORS  VALUES  C0R  PLOTTING 
C 

03  40  JMl.NJNDAT 
X( J I ) -DATA ( JI.L  ) 

Y ( J I l*OATA(J£»<) 

40  CONTINJE 
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c 

C  DOCTOR  THE  HEADING  FOR  A  DATA  PLOT  AND  SET  THE  OUTPUT  CHARACTER 
C 

IT  I TL  £ ( A  8 1 *NUN ( K  ) 

IT  I TL  £ ( 5  5 ) «NUN ( L  ) 

ITITLEI90) «NUN ( L ) 

ITITLEI 12’ >*NUH( <) 

ICHARI1 1 O' 

C 

C  CALL  THE  INS  L  ROUTINE  «HICH  WILL  GIVE  SACK  A  2-0  PLOT 
C 

I A*NUN3AT 

CALL  USPLT(X,Y, I  A. IA.i.INC# ITI TLE,R ANGE, ICHAR, 13  PT, I NAG A »  IE  R  > 

C 

C  ASSIGN  THE  X  ANO  T  VECTORS  VALUES  FOB  PLOTTING 
C 

DO  50  J2-l,NUNGEN 
Xt J2I*PSEUD0( J2,L> 

Y ( J  2 1 *®  $  £UDO( J  2  »  K 1 
50  CONTINJE 
C 

C  DOCTOR  THE  HEADING  FOR  A  PSEJDOOATA  PLOT  AND  SET  THE  OUTPUT  CHARACTER 
C 

JTITLEU9)  -NJN(K) 

JTITLEt  561 «NJN< L  I 
JTITlE(98)»NUN(L1 
JTITLE<1271*NJKK) 

ICHAR(11*'P' 

IA-N'JMGEN 

C 

C  CALL  THE  IIS  L  ROUTINE  WHICH  WILL  GIVE  BACK  A  2-0  PLOT 
C 

CALL  USPLT»X,Y,IA,IA,1,INC,JTITLE,RANGE, ICHAR, IOPT, IHAG4,IER> 

60  CONTINJE 
RETURN 
END 
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